[multi-gpu] Phase 1: namespace channel_type, add cross-rank attrs, doc plan #1576
Merged
erwei-xilinx merged 6 commits on May 6, 2026
Force-pushed from abbc586 to 38b7e10.
Pull request overview
This PR implements Phase 1 of multi-GPU messaging support by extending the AIR IR surface. It namespaces the existing NPU channel types, adds a GPU-specific symmetric-heap channel type, and introduces cross-rank addressing attributes plus an `air.symmetric` allocation marker, along with the corresponding verifier rules, tests, and documentation updates.
Changes:
- Renames existing `channel_type` values to `npu_*` to make backend scope explicit.
- Adds `gpu_symmetric_heap` channel type (rank-scoped) and cross-rank `src_rank`/`dst_rank` attributes on `air.dma_memcpy_nd`, gated by `air.rank` + `air.symmetric`.
- Updates verifier logic, MLIR tests, examples, and compute model documentation to cover the new IR surface.
Reviewed changes
Copilot reviewed 45 out of 45 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| programming_examples/matrix_vector_multiplication/bf16_cascade/matvec_cascade.py | Updates cascade channel example to use npu_cascade. |
| programming_examples/herd_dataflow/run.py | Updates default and cascade channel_type strings to npu_*. |
| programming_examples/herd_dataflow/air.mlir | Updates channel declarations/comments to npu_* naming. |
| programming_examples/flash_attention/kernel_fusion_based/attn_npu2.py | Renames cascade channels to npu_cascade. |
| programming_examples/flash_attention/kernel_fusion_based/attn_npu1.py | Renames cascade channels to npu_cascade. |
| programming_examples/flash_attention/dataflow_based/attn.py | Renames cascade channel attribute to npu_cascade. |
| programming_examples/channel_examples/mmio/mmio.py | Renames mmio channel type to npu_mmio and updates docstring. |
| programming_examples/channel_examples/dual_herd_packet_switch/dual_herd_packet_switch.py | Updates comment to refer to npu_dma_packet. |
| programming_examples/channel_examples/channel_3d_segment_unroll/channel_3d_segment_unroll.py | Renames cascade channel to npu_cascade and reformats call. |
| programming_examples/cascade_reduction/cascade_reduction.py | Renames cascade channel to npu_cascade. |
| mlir/test/Transform/AIRMiscPasses/air_split_l2_memref.mlir | Updates FileCheck expectations to npu_dma_packet. |
| mlir/test/Transform/AIRMiscPasses/air_collapse_herd.mlir | Updates cascade channel types to npu_cascade. |
| mlir/test/Transform/AIRHerdPlacement/cascade_placement.mlir | Updates cascade channel declarations to npu_cascade. |
| mlir/test/Transform/AIRDmaToChannel/dma_to_channel_no_auto_packet.mlir | Updates negative checks to npu_dma_packet. |
| mlir/test/Transform/AIRDmaToChannel/dma_to_channel_auto_packet.mlir | Updates expected upgraded channel types to npu_dma_packet. |
| mlir/test/Transform/AIRDmaToChannel/dma_to_channel_auto_packet_single_herd.mlir | Updates expected upgraded channel types to npu_dma_packet. |
| mlir/test/Transform/AIRDmaToChannel/dma_to_channel_auto_packet_broadcast.mlir | Updates broadcast upgrade expectations to npu_dma_packet. |
| mlir/test/Transform/AIRDependencyScheduleOpt/fuse_channels.mlir | Updates stream/packet channel types to npu_* for non-fusion test. |
| mlir/test/Dialect/AIR/air_memcpy_invalid.mlir | Adds verifier-negative tests for cross-rank src_rank/dst_rank and missing air.symmetric. |
| mlir/test/Dialect/AIR/air_cross_rank_dma.mlir | New round-trip tests for cross-rank DMA attrs, air.symmetric, and gpu_symmetric_heap. |
| mlir/test/Dialect/AIR/air_channel.mlir | Updates channel type round-trips and adds gpu_symmetric_heap parse/print coverage. |
| mlir/test/Dialect/AIR/air_channel_invalid.mlir | Updates allowlist diagnostic and adds verifier negatives for gpu_symmetric_heap outside air.rank. |
| mlir/test/Dialect/AIR/air_canonicalize.mlir | Updates cascade channel type to npu_cascade in canonicalization test. |
| mlir/test/Conversion/ConvertToAIR/scf_parallel_to_herd.mlir | Updates cascade channel check to npu_cascade. |
| mlir/test/Conversion/AIRToAIE/shim_pkt_channel_sharing.mlir | Updates packet channels to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/shim_packet_flow_npu.mlir | Updates packet channel types to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/shared_shim_channel_packet_ids.mlir | Updates packet channel declarations to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/segment_unroll_packet_flow_ids.mlir | Updates intra-device packet channels to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/good_shim_packet_flow_npu_4col.mlir | Updates packet channel to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/bad_shim_packet_flow_npu_1col.mlir | Updates packet channel to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/air_shimcpy_to_npu.mlir | Updates multiple packet channel types to npu_dma_packet. |
| mlir/test/Conversion/AIRToAIE/air_channel_to_locks_core_to_core.mlir | Updates cascade channel declarations to npu_cascade. |
| mlir/test/Conversion/AIRToAIE/air_channel_mmio.mlir | Updates mmio tests to npu_mmio and stream default to npu_dma_stream. |
| mlir/test/Conversion/AIRToAIE/air_channel_mmio_invalid.mlir | Updates mmio-negative diagnostics and channels to npu_mmio. |
| mlir/lib/Util/Util.cpp | Changes default inferred channel type to npu_dma_stream. |
| mlir/lib/Transform/AIRMiscPasses.cpp | Updates cascade detection to npu_cascade. |
| mlir/lib/Transform/AIRLinalgCodegen.cpp | Updates generated channel default to npu_dma_stream. |
| mlir/lib/Transform/AIRHerdPlacementPass.cpp | Updates cascade channel collection to npu_cascade. |
| mlir/lib/Transform/AIRDmaToChannel.cpp | Updates created/upgraded channel types to npu_* and mmio exclusion to npu_mmio. |
| mlir/lib/Dialect/AIR/IR/AIRDialect.cpp | Adds cross-rank DMA verification, enforces rank-scope for gpu_symmetric_heap put/get, and updates channel_type allowlist to namespaced values. |
| mlir/lib/Conversion/ConvertToAIRPass.cpp | Updates cascade channel creation to tag npu_cascade. |
| mlir/lib/Conversion/AIRToAIESchedulingUtils.cpp | Updates internal resource type strings to npu_* and mmio handling to npu_mmio. |
| mlir/lib/Conversion/AIRToAIEPass.cpp | Updates mmio gating and resource-type branching to npu_* names. |
| mlir/include/air/Dialect/AIR/AIR.td | Adds src_rank/dst_rank attrs, changes default channel_type to npu_dma_stream, and documents gpu_symmetric_heap. |
| docs/AIRComputeModel.md | Documents cross-rank DMA attrs, namespaced channel types, and air.symmetric attribute; updates summary tables accordingly. |
…c plan
Step toward multi-GPU messaging support per docs/MultiGPUPlan.md. Pure IR/dialect
changes — no lowering yet.
## channel_type namespace rename (Option 1)
Existing channel_type values gain a `npu_` prefix to make backend scope explicit:
- `dma_stream` → `npu_dma_stream` (default)
- `dma_packet` → `npu_dma_packet`
- `cascade` → `npu_cascade`
- `mmio` → `npu_mmio`
Mechanical rename across 33 files (verifier, transform/conversion passes, all
.mlir tests, Python programming examples).
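
The rename above is a fixed string-to-string mapping, so it can be sketched as a small text rewrite. This is only an illustration, not the script used for the PR; the helper name `migrate_channel_types` and the regular expression are assumptions made for this example.

```python
import re

# Old -> new channel_type names, taken from the rename list above.
RENAMES = {
    "dma_stream": "npu_dma_stream",
    "dma_packet": "npu_dma_packet",
    "cascade": "npu_cascade",
    "mmio": "npu_mmio",
}

# Matches `channel_type = "<old name>"`, tolerating whitespace around `=`.
# The closing quote is part of the match, so already-renamed values such as
# "npu_cascade" are left alone.
_PATTERN = re.compile(
    r'(channel_type\s*=\s*")(' + "|".join(RENAMES) + r')(")'
)


def migrate_channel_types(text: str) -> str:
    """Hypothetical helper: rewrite pre-rename channel_type values to npu_*."""
    return _PATTERN.sub(
        lambda m: m.group(1) + RENAMES[m.group(2)] + m.group(3), text
    )
```

Because the pattern only looks at the quoted value following `channel_type`, the same helper would apply to `.mlir` files and to Python files that embed MLIR strings or pass `channel_type="..."` as a keyword argument.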
## New channel_type for GPU multi-rank messaging
- `gpu_symmetric_heap`: cross-rank channel through the symmetric heap runtime
(runtime_lib/airgpu/symmetric_heap.{h,cpp}). Verifier requires put/get sites
to be inside an `air.rank` scope.
## air.dma_memcpy_nd cross-rank addressing
- New optional integer attributes `src_rank` / `dst_rank` name a peer rank in
the enclosing `air.rank` scope.
- Verifier requires:
- an enclosing `air.rank` scope
- the peer-side memref's `memref.alloc` (when directly available) to carry
the `air.symmetric` attribute
- Backward-compatible builder so existing call sites compile unchanged.
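
The two verifier requirements listed above can be modelled in a few lines. The sketch below is a toy stand-in written for illustration, not the code in AIRDialect.cpp: the `CrossRankDma` class, its field names, and the error strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrossRankDma:
    """Toy stand-in for an air.dma_memcpy_nd op; not the real MLIR API."""
    inside_rank_scope: bool
    src_rank: Optional[int] = None
    dst_rank: Optional[int] = None
    # None models the "not directly available" case: the peer-side memref
    # does not come straight from a memref.alloc, so nothing is checked.
    peer_alloc_is_symmetric: Optional[bool] = None


def verify(op: CrossRankDma) -> List[str]:
    """Return hypothetical diagnostics for the two rules described above."""
    errors: List[str] = []
    if op.src_rank is None and op.dst_rank is None:
        return errors  # plain DMA: the cross-rank rules do not apply
    if not op.inside_rank_scope:
        errors.append("src_rank/dst_rank require an enclosing air.rank scope")
    if op.peer_alloc_is_symmetric is False:
        errors.append("peer-side memref.alloc must carry air.symmetric")
    return errors
```

An op without either attribute passes unchanged, which mirrors the backward-compatibility goal of keeping existing call sites valid.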
## air.symmetric memref attribute
A unit attribute on `memref.alloc` indicating the allocation is backed by the
symmetric heap. Documented in docs/AIRComputeModel.md §2.7.
## Documentation
- New docs/MultiGPUPlan.md: full design and 7-phase implementation plan
- docs/AIRComputeModel.md: §2.4 cross-rank addressing, §2.7 air.symmetric,
§2.5 channel_type table updated, §5 summary table updated
## Tests
- mlir/test/Dialect/AIR/air_cross_rank_dma.mlir (new): positive round-trip
for src_rank/dst_rank, air.symmetric memref, gpu_symmetric_heap channel
put/get inside air.rank
- mlir/test/Dialect/AIR/air_channel_invalid.mlir: gpu_symmetric_heap
put/get outside air.rank rejected; updated unsupported channel_type
error message
- mlir/test/Dialect/AIR/air_memcpy_invalid.mlir: src_rank/dst_rank
outside air.rank rejected; missing air.symmetric on alloc rejected
All 21 mlir/test/Dialect/AIR/ tests pass; GPU dma_copy and 4k_4k_mul e2e
tests pass on MI300A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply clang-format-17 reflow to three .cpp files (text-string wrapping across the renamed channel_type values "npu_mmio" / "npu_cascade" / "npu_dma_stream") and black reformat to one .py file (npu_cascade arg list now exceeds the line limit). These were reported by the lintAndFormat workflow on PR Xilinx#1576; this commit folds them into Phase 1 so the diff CI saw is what's now in tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six Copilot comments on PR Xilinx#1576:

1. AIRToAIESchedulingUtils.cpp: four diagnostic strings still said "dma_stream / dma_packet" after the rename to "npu_dma_stream / npu_dma_packet". Updated.
2. docs/AIRComputeModel.md (cross-rank DMA, §2.4): said the GPU backend lowers src_rank/dst_rank, contradicting the summary table that calls it "planned". Reworded as "planned: air-cross-rank-dma-to-mgpu" to match.
3. docs/AIRComputeModel.md (air.symmetric, §2.7): same inconsistency for mgpuSymmetricAlloc routing. Reworded as "planned: air-symmetric-alloc-to-mgpu".
4. AIR.td (DmaMemcpyNdOp description): same inconsistency. Reworded.
5. AIR.td (gpu_symmetric_heap channel_type description): claimed "Lowered by air-to-rocdl to thread-cooperative loops..." with no such lowering yet in tree. Reworded as "planned: air-gpu-channel-to-mgpu".
6. AIRDialect.cpp DmaMemcpyNdOp::verify: rank indices are non-negative. Added explicit `>= 0` check, plus matching verifier-negative tests in air_memcpy_invalid.mlir for both src_rank=-1 and dst_rank=-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit (888bcaa) added a `>= 0` verifier on src_rank / dst_rank, but used `getSrcRank()` / `getDstRank()` — those return `std::optional<uint64_t>` (a TableGen quirk for `OptionalAttr<I64Attr>`), so `*sr < 0` on the unsigned value is always false and the check never fired. The two new verifier-negative tests in air_memcpy_invalid.mlir silently regressed.

Switch to the typed `getSrcRankAttr()` / `getDstRankAttr()` accessors which return `IntegerAttr`, then call `.getInt()` for a real `int64_t`. The check now fires on negative values; both negative-rank tests pass under `lit -sv ../../mlir/test/Dialect/AIR` (21/21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
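
The failure mode described in this commit message, a `< 0` test on an unsigned value, can be reproduced outside C++. The sketch below emulates 64-bit two's-complement reinterpretation in Python; the function names are made up for this example and do not correspond to anything in the AIR code base.

```python
MASK64 = (1 << 64) - 1


def as_uint64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    return value & MASK64


def buggy_is_negative(rank: int) -> bool:
    # Mirrors comparing an unsigned 64-bit value against 0: this can never
    # be true, so a negative rank slips through.
    return as_uint64(rank) < 0


def fixed_is_negative(rank: int) -> bool:
    # Mirrors reading the attribute as a signed 64-bit integer first.
    return rank < 0
```

With a rank of -1, the unsigned view is 2**64 - 1, which is why the original check never fired while the signed comparison does.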
origin/main grew 5 new herd-placement tests via Xilinx#1583 that use the pre-rename `channel_type = "cascade"`. After this PR's namespace rename ("cascade" -> "npu_cascade"), those tests fail under air-opt with the verifier rejecting the old name. Update them to "npu_cascade" so they keep passing on top of phase 1.

Verified on rad-mi300a-sh5-1: AIRHerdPlacement 15/15 pass, Dialect/AIR 21/21 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 90c90d6 to 965f853.
CI on 'Build and Test with AIE tools on Ryzen AI (amdhx370)' caught one more stale "cascade" reference: test/xrt/34_cascade_vecadd/run_peano.py embeds an inline MLIR string that declared `channel_type = "cascade"`. Update to "npu_cascade" to match the namespace rename. The corresponding run_chess.py variant didn't have this issue.

Verifier diagnostic from the failing job:

    'air.channel' op unsupported channel_type "cascade"; expected one of "npu_dma_stream", "npu_dma_packet", "npu_cascade", "npu_mmio", or "gpu_symmetric_heap"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
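
The allowlist behind that diagnostic can be modelled as a simple membership check. This is a minimal sketch assuming the five accepted values quoted in the diagnostic; `check_channel_type` is a hypothetical name, and the real check lives in the C++ verifier.

```python
from typing import Optional

# Accepted channel_type values, as listed in the verifier diagnostic.
ALLOWED = (
    "npu_dma_stream",
    "npu_dma_packet",
    "npu_cascade",
    "npu_mmio",
    "gpu_symmetric_heap",
)


def check_channel_type(name: str) -> Optional[str]:
    """Return None if `name` is accepted, else a diagnostic-style message."""
    if name in ALLOWED:
        return None
    quoted = [f'"{n}"' for n in ALLOWED]
    expected = ", ".join(quoted[:-1]) + ", or " + quoted[-1]
    return (
        f"'air.channel' op unsupported channel_type \"{name}\"; "
        f"expected one of {expected}"
    )
```

Running a check like this over strings extracted from test drivers would flag stale names such as "cascade" before CI does, though it only covers values the caller has already located.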
First step toward multi-GPU messaging support. Pure IR/dialect changes — no lowering yet (Phases 2–7 land separately as #1577–#1582).
## Summary

### `channel_type` namespace rename (Option 1)

Existing values gain a `npu_` prefix to make backend scope explicit:

- `dma_stream` (default) → `npu_dma_stream`
- `dma_packet` → `npu_dma_packet`
- `cascade` → `npu_cascade`
- `mmio` → `npu_mmio`

Mechanical rename across 33 files (verifier, transform/conversion passes, all `.mlir` tests, Python programming examples).

### New GPU multi-rank channel type

- `gpu_symmetric_heap`: cross-rank channel through the symmetric heap runtime (runtime_lib/airgpu/symmetric_heap.{h,cpp}). Verifier requires put/get sites to be inside an `air.rank` scope.

### `air.dma_memcpy_nd` cross-rank addressing

- `src_rank` / `dst_rank` integer attributes name a peer rank in the enclosing `air.rank` scope.
- Verifier requires:
  - an enclosing `air.rank` scope
  - the peer-side memref's `memref.alloc` (when directly available) to carry the `air.symmetric` attribute

### `air.symmetric` memref attribute

A unit attribute on `memref.alloc` indicating the allocation should be backed by the symmetric heap. Documented in docs/AIRComputeModel.md §2.7.

## Documentation

docs/AIRComputeModel.md updated to describe the new IR surface:

- cross-rank addressing on `air.dma_memcpy_nd`
- `channel_type` table including the `npu_*` rename and `gpu_symmetric_heap`
- `air.symmetric` memref attribute

## Test plan

- `mlir/test/Dialect/AIR/` tests pass (positive round-trip + verifier negatives)
- `air_cross_rank_dma.mlir`: round-trip for `src_rank`/`dst_rank`, `air.symmetric` memref, `gpu_symmetric_heap` channel inside `air.rank`
- `air_channel_invalid.mlir`: `gpu_symmetric_heap` put/get outside `air.rank` rejected
- `air_memcpy_invalid.mlir`: `src_rank`/`dst_rank` outside `air.rank` rejected, missing `air.symmetric` on alloc rejected

🤖 Generated with Claude Code